We consider the classical Gaussian white noise model

$$dX_\varepsilon(t) = f(t)\,dt + \varepsilon\,dW(t), \qquad t \in [0,1], \tag{1}$$

where $f(t)$ is an unknown signal, $W(t)$ is a standard Brownian motion, and the noise level $\varepsilon > 0$ is
known. We assume that the function $f$ is continuous everywhere on $[0,1]$ except at some unknown
point $\tau$ and depends on some unknown parameter $\theta$, $f(t) = f^{\tau}(\theta,t)$.
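
To make model (1) concrete, the following is a minimal sketch of how a path could be simulated on a grid. It is an illustration only: the Euler discretisation, the function names `simulate_white_noise` and `step_signal`, and the particular step signal are assumptions introduced here, not part of the paper.

```python
import numpy as np

def simulate_white_noise(f, eps, n_steps=1000, rng=None):
    """Euler discretisation of dX(t) = f(t) dt + eps dW(t) on [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 1.0 / n_steps
    t = np.arange(n_steps) * dt                        # left endpoints of the grid cells
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)    # Brownian increments
    dX = f(t) * dt + eps * dW
    return t, dX

def step_signal(t, theta1=1.0, theta2=0.0, tau=0.4):
    """Hypothetical discontinuous signal: theta1 up to tau, theta2 afterwards."""
    return np.where(t <= tau, theta1, theta2)

t, dX = simulate_white_noise(step_signal, eps=0.1, rng=np.random.default_rng(0))
X = np.cumsum(dX)   # discretised observed path
```

With `eps=0` the increments sum to the integral of the signal, which gives a quick sanity check of the discretisation.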

Let $\mathcal{L} : L^2[0,1] \to \mathbb{R}$ be a given smooth functional of $f$. The goal of this paper is to compare
Bayesian and maximum likelihood estimates of $\mathcal{L}[f]$ assuming that the function $f$ is known up to
the parameters $\tau$ and $\theta$. Let $\hat{\mathcal{L}}(X_\varepsilon)$ be an estimate of $\mathcal{L}[f]$. We will use the quadratic loss function
and the mean squared risk for measuring the performance of the estimator:

$$R_\varepsilon(\hat{\mathcal{L}}, \mathcal{L}) = \mathbf{E}_{\theta,\tau}\big(\hat{\mathcal{L}}(X_\varepsilon) - \mathcal{L}[f]\big)^2 .$$

The model of observations (1) of the Wiener process with a discontinuous drift was first considered
by Ibragimov and Hasminski [10]. Assuming that the function $f$ is known up to an unknown
discontinuity point $\tau$, the authors studied the asymptotic efficiencies of Bayesian and maximum
likelihood estimates of $\tau$ as $\varepsilon \to 0$. The asymptotic mean-square error of an MLE of the discontinuity point
$\tau$ was calculated and an approximate value of the quadratic risk of a Bayes estimate was obtained.
Later Rubin and Song [13] found an exact representation for the mean-square error of the Bayes
estimate of $\tau$ in terms of Riemann's zeta function. According to these results, the Bayes procedure
of estimating a change point is asymptotically more efficient than the maximum likelihood
procedure, with the asymptotic relative efficiency $\frac{8}{13}\zeta(3) \approx 0.7397$.

This problem is closely related to the famous change-point problem considered by many authors.
The literature on the change-point problem is vast; we refer the reader to the monographs of Csörgő
and Horváth [7] on asymptotic theory in the change-point problem, of Brodsky and Darkhovsky [5]
on non-parametric methods, of Shiryaev [15] on optimal detection of change in distribution, and
the many references therein. We also refer to an excellent review article of Bhattacharya [1] that
provides historical perspectives on the classical change-point problem.

In spite of the long history of the change-point problem, the problem of estimating a smooth
functional of a discontinuous signal has not been considered. We construct two estimates of $\mathcal{L}$ for
model (1). We compare the asymptotic efficiencies of the MLE and the Bayesian estimate in the white
noise model following the approach of [10].

The paper is organized as follows. In Section 2 we give a precise statement of the problem and 
obtain the asymptotic likelihood ratio process. In Section 3 the results on the relative efficiency 
of Bayesian and maximum likelihood estimates of the smooth functional are presented. Section 4 
contains the results for a sequence version of (1) with a simple signal representing the change in 
mean of a Gaussian sequence. In Section 5 we present simulation results for different signal-to-noise
ratios and discuss both asymptotic and non-asymptotic aspects of the problem.

2 Limiting likelihood ratio process 

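It will be easier to work with a stochastic process $Y(t)$ satisfying the stochastic differential equation

$$dY(t) = \frac{1}{\varepsilon}\, f^{\tau}(\theta,t)\,dt + dW(t), \qquad t \in [0,1], \tag{2}$$

where $W(t)$ is the standard Wiener process, $W(0) = 0$, and $\varepsilon > 0$.
Assume that the function $f^{\tau}(\theta,t)$ is defined as

$$f^{\tau}(\theta,t) = f_1(\theta_1,t)\,\mathbf{1}\{0 \le t \le \tau\} + f_2(\theta_2,t)\,\mathbf{1}\{\tau < t \le 1\}, \tag{3}$$

where $\theta = (\theta_1,\theta_2)$ and $\tau$ are unknown parameters that belong to some compact sets, $\theta \in \Theta =
\Theta_1 \times \Theta_2 \subset \mathbb{R}^2$, $\tau \in T = [a,b]$. We assume that $0 < a < \tau < b < 1$, so that the change-point $\tau$
is separated from 0 and 1 and the change in the data happened within the interval $[a,b]$. Denote
by $\Delta = f^{\tau}(\theta,\tau-) - f^{\tau}(\theta,\tau+) = f_1(\theta_1,\tau) - f_2(\theta_2,\tau)$ the jump size at the point $\tau$, assuming that
$\Delta \ne 0$.

In fact, $f^{\tau}(\theta,\cdot)$ depends on $\theta_1$ on $[0,\tau]$ and on $\theta_2$ on $[\tau,1]$. Thus, by abuse of notation we will
write $\frac{\partial f_i}{\partial \theta_i}(\theta_i,t)$ meaning the value of the partial derivative of $f_i(x,t)$ with respect to $x$ at $x = \theta_i$.

Let $\mathcal{F} \subset L^2[0,1]$ be a linear space such that for any fixed parameters $\theta$ and $\tau$ the function
$f^{\tau}(\theta,\cdot) \in \mathcal{F}$ satisfies the following condition.

Condition F. Assume that

(a) The functions $f_i(\theta_i,t)$, $i = 1,2$, are continuous in $t$ on $[0,\tau]$ and $[\tau,1]$, and in $\theta_i$ on $\Theta_i$, respectively.

(b) For any $x$ in a neighborhood of $\theta$, $f^{\tau}(x,t)$ has a bounded derivative $\frac{\partial f^{\tau}}{\partial \theta}(\theta,t)$ for all $t \in [0,1]$
except $t = \tau$.

(c) $f_i(\theta,t)$, $i = 1,2$, are differentiable with respect to $\theta$ at the $\theta_i$'s such that

$$\lim_{\delta \to 0} \frac{1}{|\delta|}\,\Big\| f_i(\theta_i + \delta,\cdot) - f_i(\theta_i,\cdot) - \frac{\partial f_i}{\partial \theta_i}(\theta_i,\cdot)\,\delta \Big\|_{L^2[0,1]} = 0.$$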

The problem is to estimate a smooth functional $\mathcal{L}[f^{\tau}(\theta,\cdot)]$ of the signal $f^{\tau}(\theta,\cdot)$. Below the
conditions on the functional $\mathcal{L}$ are specified.

Condition L. Let $\theta \in \Theta$ and $\tau \in T$ be fixed. The functional $\mathcal{L} : \mathcal{F} \subset L^2[0,1] \to \mathbb{R}$ is Fréchet
differentiable at $f^{\tau}(\theta,\cdot) \in \mathcal{F}$.

Condition L'. The value $\mathcal{L}[f^{\tau}(\theta,\cdot)]$ of the functional $\mathcal{L} : \mathcal{F} \subset L^2[0,1] \to \mathbb{R}$ at $f^{\tau}(\theta,\cdot)$ is
differentiable with respect to $\tau$ in a neighborhood of 0.

Let $\hat{\tau}^{\varepsilon}_{mle}$ be an MLE of $\tau$ based on observations (2). Let $\hat{\tau}^{\varepsilon}_{B}$ be a Bayes estimate of $\tau$ based on
observations (2), where $\tau$ has some positive prior distribution on $[0,1]$. The analysis of quadratic
errors of these two estimates is based on the properties of the stochastic process

$$V(t) = \exp\{B(t) - |t|/2\}, \tag{4}$$

where $B(t)$ is the two-sided Brownian motion defined by

$$B(t) = \begin{cases} W_1(t), & t \ge 0, \\ W_2(-t), & t < 0. \end{cases} \tag{5}$$

Here $W_i(t)$, $t \ge 0$, $i = 1,2$, are independent standard Wiener processes with $W_i(0) = 0$. In fact, if
$\Delta = f_1(\theta_1,\tau) - f_2(\theta_2,\tau)$, then the process $V(\Delta^2 t)$ is a limiting process for the likelihood ratio of $\tau$
corresponding to the observations $Y(t)$ [10].
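
The process (4) is easy to simulate, which gives a rough numerical feel for the quantities studied in Section 3. The sketch below is an illustration under stated assumptions: it takes $\Delta = 1$, truncates the real line to a finite window, and discretises $B(t)$ on a uniform grid; the function names and grid settings are hypothetical choices, and the resulting Monte Carlo ratio is only a crude approximation of the limiting efficiency.

```python
import numpy as np

def two_sided_bm(half_width, dt, rng):
    """Two-sided Brownian motion B(t) on a uniform grid over [-half_width, half_width]."""
    m = int(round(half_width / dt))
    right = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), m))))  # W_1(t), t >= 0
    left = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), m))))   # W_2(s), s = -t >= 0
    t = np.arange(-m, m + 1) * dt
    B = np.concatenate((left[:0:-1], right))   # W_2(-t) for t < 0, then W_1(t) for t >= 0
    return t, B

def limit_functionals(t, B):
    """Return (argmax of V, V-weighted mean of t) for V(t) = exp(B(t) - |t|/2)."""
    log_v = B - np.abs(t) / 2.0
    w = np.exp(log_v - log_v.max())            # rescaled V, avoids overflow
    return t[np.argmax(log_v)], np.sum(t * w) / np.sum(w)

rng = np.random.default_rng(1)
draws = np.array([limit_functionals(*two_sided_bm(40.0, 0.01, rng)) for _ in range(200)])
# Empirical ratio of second moments: weighted mean versus argmax.
ratio = np.mean(draws[:, 1] ** 2) / np.mean(draws[:, 0] ** 2)
```

The argmax corresponds to the maximum likelihood functional and the weighted mean to the generalized Bayes functional of the limiting likelihood; with only 200 replications the ratio is noisy and should be read qualitatively.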

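Recall that $\theta$ and $\tau$ are defined on a compact set $\Theta = \Theta_1 \times \Theta_2 \subset \mathbb{R}^2$ and on the interval
$T = (a,b) \subset [0,1]$, $0 < a < b < 1$, respectively. Following the approach of [10] we will fix the
unknown parameters $\theta = (\theta_1,\theta_2)$ and $\tau$ and work with the local parameters $h = (h_1,h_2)$ and $u$.
Introduce the normalizing sets $\Theta_\varepsilon = \Theta^{\varepsilon}_1 \times \Theta^{\varepsilon}_2$, where $\Theta^{\varepsilon}_i = \varepsilon^{-1}(\Theta_i - \theta_i)$, and $T_\varepsilon = \varepsilon^{-2}(T - \tau)$, such that
$h \in \Theta_\varepsilon$ and $u \in T_\varepsilon$. Let $P_{\theta,\tau}$ be the measure generated by the process (2) and $Z^{\varepsilon}_{\theta,\tau}(h,u)$ be the
likelihood ratio of $\theta$ and $\tau$ based on this process,

$$Z^{\varepsilon}_{\theta,\tau}(h,u) = \frac{dP_{\theta+\varepsilon h,\,\tau+\varepsilon^2 u}}{dP_{\theta,\tau}}\,(Y(t)).$$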

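Lemma 1. Let $H = H_1 \times H_2 \subset \Theta_\varepsilon$ and $U \subset T_\varepsilon$ be compact sets and Condition F be satisfied.
The distribution of the log-likelihood ratio process $\log Z^{\varepsilon}_{\theta,\tau}(h,u)$ as $\varepsilon \to 0$ converges uniformly over
$(h,u) \in H \times U$ to the distribution of the process

$$\log Z^{0}_{\theta,\tau}(h,u) = \frac{1}{2}(Z_1^2 + Z_2^2) - \frac{1}{2} I_1^2 \Big(h_1 - \frac{Z_1}{I_1}\Big)^2 - \frac{1}{2} I_2^2 \Big(h_2 - \frac{Z_2}{I_2}\Big)^2 + \log V(\Delta^2 u), \tag{6}$$

where $Z_1$ and $Z_2$ are independent $\mathcal{N}(0,1)$, the process $V(u)$ defined in (4) is independent of $Z_1$ and
$Z_2$, and

$$I_1 = \Big( \int_0^{\tau} \Big( \frac{\partial f_1}{\partial \theta_1}(\theta_1,t) \Big)^2 dt \Big)^{1/2}, \qquad I_2 = \Big( \int_{\tau}^{1} \Big( \frac{\partial f_2}{\partial \theta_2}(\theta_2,t) \Big)^2 dt \Big)^{1/2}.$$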
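Proof. From Girsanov's theorem (see [10], Appendix II, Theorem 1) it follows that the
likelihood ratio for the measures generated by $Y(t)$ with the parameters $\theta$, $\tau$ and $\theta + \varepsilon h$, $\tau + \varepsilon^2 u$
satisfies

$$\log Z^{\varepsilon}_{\theta,\tau}(h,u) = \log \frac{dP_{\theta+\varepsilon h,\,\tau+\varepsilon^2 u}}{dP_{\theta,\tau}} = s_\varepsilon(h,u) - \frac{1}{2}\,\Gamma^2_\varepsilon(h,u), \tag{7}$$

where

$$s_\varepsilon(h,u) = \frac{1}{\varepsilon} \int_0^1 \big( f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,t) - f^{\tau}(\theta,t) \big)\, dW(t),$$

$$\Gamma^2_\varepsilon(h,u) = \frac{1}{\varepsilon^2} \int_0^1 \big( f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,t) - f^{\tau}(\theta,t) \big)^2 dt.$$

Using the same approach as in Lemma 7.2.1 of [10] and Condition F[c] it is not difficult to show
that uniformly over the compact set $H \times U$

$$\Gamma^2_\varepsilon(h,u) = \Delta^2 |u| + h_1^2 \int_0^{\tau} \Big( \frac{\partial f_1}{\partial \theta_1}(\theta_1,t) \Big)^2 dt + h_2^2 \int_{\tau}^{1} \Big( \frac{\partial f_2}{\partial \theta_2}(\theta_2,t) \Big)^2 dt + o(1)$$

$$= \Delta^2 |u| + h_1^2 I_1^2 + h_2^2 I_2^2 + o(1), \qquad \varepsilon \to 0.$$

Consider now the stochastic part $s_\varepsilon(h,u)$ of the process. Let $u > 0$. We have

$$s_\varepsilon(h,u) = \frac{1}{\varepsilon} \int_0^{\tau} \big( f_1(\theta_1 + \varepsilon h_1,t) - f_1(\theta_1,t) \big)\, dW(t) + \frac{1}{\varepsilon} \int_{\tau+\varepsilon^2 u}^{1} \big( f_2(\theta_2 + \varepsilon h_2,t) - f_2(\theta_2,t) \big)\, dW(t)$$

$$+ \frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_1(\theta_1 + \varepsilon h_1,t) - f_2(\theta_2,t) \big)\, dW(t). \tag{8}$$

Following [10], from Condition F[c] we can obtain the weak convergence of the first two terms of (8) to

$$h_1 Z_1 I_1 = h_1 \int_0^{\tau} \frac{\partial f_1}{\partial \theta_1}(\theta_1,t)\, dW(t) \quad \text{and} \quad h_2 Z_2 I_2 = h_2 \int_{\tau}^{1} \frac{\partial f_2}{\partial \theta_2}(\theta_2,t)\, dW(t),$$

respectively, where $I_1$ and $I_2$ are defined in the statement of the lemma and $Z_1$ and $Z_2$ are
independent $\mathcal{N}(0,1)$.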

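Indeed, consider the first term of (8). If

$$V_\varepsilon(h_1) = \frac{1}{\varepsilon} \int_0^{\tau} \Big[ \big( f_1(\theta_1 + \varepsilon h_1,t) - f_1(\theta_1,t) \big) - h_1 \varepsilon\, \frac{\partial f_1}{\partial \theta_1}(\theta_1,t) \Big]\, dW(t),$$

then $\mathbf{E} V_\varepsilon(h_1) = 0$. From Condition F[c] it follows that for some constant $C > 0$ and any $h_1', h_1'' \in H_1$

$$\mathbf{E}\big( V_\varepsilon(h_1') - V_\varepsilon(h_1'') \big)^2 \le C\,(h_1' - h_1'')^2 \int_0^{\tau} \Big( \frac{\partial f_1}{\partial \theta_1}(\theta_1,t) \Big)^2 dt.$$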
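Consequently, from Theorem 1.A.19 of Prokhorov in [10], p. 372, we obtain that for any $\delta > 0$

$$\lim_{\varepsilon \to 0} P_{\theta,\tau} \Big\{ \sup_{h_1 \in H_1} |V_\varepsilon(h_1)| > \delta \Big\} = 0.$$

Next, from the properties of the stochastic integral we have

$$\int_0^{\tau} \frac{\partial f_1}{\partial \theta_1}(\theta_1,t)\, dW(t) \overset{d}{=} Z_1 \Big( \int_0^{\tau} \Big( \frac{\partial f_1}{\partial \theta_1}(\theta_1,t) \Big)^2 dt \Big)^{1/2} = Z_1 I_1.$$

Hence, uniformly over $h \in H$ the distribution of the first term in (8) converges to the distribution
of $h_1 Z_1 I_1$. Similarly, we can show the weak convergence of the second term of (8) to $h_2 Z_2 I_2$
uniformly over $H$.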

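Next, the last term in (8) can be written as

$$\frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_1(\theta_1 + \varepsilon h_1,t) - f_2(\theta_2,t) \big)\, dW(t) = \frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_1(\theta_1 + \varepsilon h_1,t) - f_1(\theta_1,\tau) \big)\, dW(t)$$

$$+ \frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_1(\theta_1,\tau) - f_2(\theta_2 + \varepsilon h_2,\tau) \big)\, dW(t)$$

$$+ \frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_2(\theta_2 + \varepsilon h_2,\tau) - f_2(\theta_2,t) \big)\, dW(t).$$

It can be shown, similarly to the proof in [10], that the first and the third terms converge to zero
in probability uniformly over $H \times U$. For the second term we have

$$\frac{1}{\varepsilon} \int_{\tau}^{\tau+\varepsilon^2 u} \big( f_1(\theta_1,\tau) - f_2(\theta_2 + \varepsilon h_2,\tau) \big)\, dW(t) = \big( f_1(\theta_1,\tau) - f_2(\theta_2 + \varepsilon h_2,\tau) \big)\, \frac{1}{\varepsilon} \big[ W(\tau + \varepsilon^2 u) - W(\tau) \big]$$

$$= \big[ f_1(\theta_1,\tau) - f_2(\theta_2 + \varepsilon h_2,\tau) \big]\, W_1(u) \to \Delta W_1(u).$$

Note that the Wiener process $W_1(u) = \varepsilon^{-1} [W(\tau + \varepsilon^2 u) - W(\tau)]$ is independent of $Z_1$ and $Z_2$, since
the three summands in (8) are independent.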

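Thus, combining the estimates for the stochastic and non-stochastic terms, we obtain the
convergence of the distribution of $\log Z^{\varepsilon}_{\theta,\tau}$ for $u > 0$ to the distribution of

$$\log Z^{0}_{\theta,\tau}(h,u) = \Delta W_1(u) + Z_1 h_1 I_1 + Z_2 h_2 I_2 - \frac{1}{2} \Delta^2 |u| - \frac{1}{2} \big( h_1^2 I_1^2 + h_2^2 I_2^2 \big)$$

$$= \frac{1}{2}(Z_1^2 + Z_2^2) - \frac{1}{2} I_1^2 \Big( h_1 - \frac{Z_1}{I_1} \Big)^2 - \frac{1}{2} I_2^2 \Big( h_2 - \frac{Z_2}{I_2} \Big)^2 + \Delta \Big( W_1(u) - \frac{1}{2} \Delta |u| \Big)$$

uniformly over $H \times U$. A similar analysis for $u < 0$ yields the statement of the lemma. ∎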
Remark 1. This result can be generalized to the case of multiple change-points. 
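
As a small numerical illustration of the limit (6), the sketch below evaluates the quantities $I_1$, $I_2$ by quadrature and checks the completing-the-square step that turns $Z_i h_i I_i - h_i^2 I_i^2/2$ into $Z_i^2/2 - I_i^2 (h_i - Z_i/I_i)^2/2$. The signal pieces $f_1(\theta_1,t) = \theta_1(1+t)$ and $f_2(\theta_2,t) = \theta_2 t$, the value $\tau = 0.4$, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def info_norm(dfdtheta, lo, hi, n=20000):
    """Square root of the integral of (df/dtheta)^2 over [lo, hi], midpoint rule."""
    mid = lo + (np.arange(n) + 0.5) * (hi - lo) / n
    return np.sqrt(np.sum(dfdtheta(mid) ** 2) * (hi - lo) / n)

tau = 0.4
I1 = info_norm(lambda t: 1.0 + t, 0.0, tau)   # derivative of theta1 * (1 + t) in theta1
I2 = info_norm(lambda t: t, tau, 1.0)          # derivative of theta2 * t in theta2

def gaussian_part(h, z, I):
    """One coordinate of the limit before completing the square."""
    return z * h * I - 0.5 * (h * I) ** 2

def completed_square(h, z, I):
    """The same quantity in the form used in (6)."""
    return 0.5 * z ** 2 - 0.5 * I ** 2 * (h - z / I) ** 2

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=2)
h_hat = np.array([z1 / I1, z2 / I2])   # maximisers of the Gaussian part in h_1, h_2
```

For these particular pieces the integrals are available in closed form, $I_1^2 = ((1+\tau)^3 - 1)/3$ and $I_2^2 = (1 - \tau^3)/3$, so the quadrature can be verified directly.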
3 Relative efficiency of two estimates of $\mathcal{L}[f]$



First, recall the problem of estimating the point of discontinuity $\tau$ of the discontinuous signal
$f$ from observations (1) that was studied by Ibragimov and Hasminski [10]. For the quadratic loss
function, they compared the asymptotic efficiencies of the maximum likelihood and generalized Bayesian
estimators of $\tau$. It turned out that asymptotically the ratio of the quadratic risks of the Bayesian estimate
and the MLE of $\tau$ does not depend on the function $f$ with the discontinuity point $\tau$ and that the
Bayesian estimator of $\tau$ is more efficient than the MLE of $\tau$.

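As mentioned above, if $\Delta$ is the jump size at the point $\tau$, then $V(\Delta^2 t)$ is a limiting
likelihood ratio process for estimating $\tau$. Denote the MLE and Bayesian estimate of $\tau$ by $\hat{\tau}^{\varepsilon}_{mle}$ and
$\hat{\tau}^{\varepsilon}_{B}$, respectively. Let $u$ be a local parameter for $\tau$ with the normalization sequence $\varepsilon^2$. Define

$$\hat{\tau}_{mle} = \tau + \varepsilon^2 u_{mle}, \qquad \hat{\tau}_{B} = \tau + \varepsilon^2 u_{B},$$

where $u_{mle}$ is the point at which the limiting likelihood attains its maximum and $u_B$ is the generalized
Bayesian estimate of $u$ that corresponds to the limiting likelihood. Namely, we have

$$u_{mle} = \arg\max_{t \in \mathbb{R}} V(\Delta^2 t) = \frac{1}{\Delta^2} \arg\max_{t \in \mathbb{R}} V(t), \qquad u_B = \frac{\int_{\mathbb{R}} t\, V(\Delta^2 t)\, dt}{\int_{\mathbb{R}} V(\Delta^2 t)\, dt} = \frac{1}{\Delta^2}\, \frac{\int_{\mathbb{R}} t\, V(t)\, dt}{\int_{\mathbb{R}} V(t)\, dt}. \tag{9}$$

Then the asymptotic relative efficiency of $\hat{\tau}^{\varepsilon}_{mle}$ and $\hat{\tau}^{\varepsilon}_{B}$ coincides with the relative efficiency of the
estimates $u_{mle}$ and $u_B$,

$$\lim_{\varepsilon \to 0} \frac{\mathbf{E}_{\tau} (\hat{\tau}^{\varepsilon}_{B} - \tau)^2}{\mathbf{E}_{\tau} (\hat{\tau}^{\varepsilon}_{mle} - \tau)^2} = \frac{\mathbf{E}_{\tau} (\hat{\tau}_{B} - \tau)^2}{\mathbf{E}_{\tau} (\hat{\tau}_{mle} - \tau)^2} = \frac{\mathbf{E} u_B^2}{\mathbf{E} u_{mle}^2} = \kappa_0. \tag{10}$$

Ibragimov and Hasminski [10] showed that $\mathbf{E} u_{mle}^2 = 26/\Delta^4$, but they stated that $\mathbf{E} u_B^2$ is hard to
evaluate explicitly. Using computational methods, they obtained the approximate value
$\Delta^4\, \mathbf{E} u_B^2 = 19.5 \pm 0.5$ and the efficiency $\kappa_0 \approx 0.73 \pm 0.03$. Later Rubin and Song [13] obtained the
exact value of $\mathbf{E} u_B^2$ and the asymptotic relative efficiency of the two estimates,

$$\kappa_0 = \frac{8}{13}\,\zeta(3) \approx 0.7397,$$

which is very close to the approximate value found in [10]. Here $\zeta$ is Riemann's zeta
function defined as

$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}.$$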
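
The constant above can be checked with a few lines of arithmetic. The sketch below sums the series for $\zeta(3)$ with a simple integral tail correction (a choice made here for the example, as are the function name and the number of terms) and forms $16\zeta(3)/26 = 8\zeta(3)/13$.

```python
from math import fsum

def zeta3(n_terms=100_000):
    """Approximate zeta(3) by a partial sum plus the tail estimate 1 / (2 N^2)."""
    partial = fsum(1.0 / n ** 3 for n in range(1, n_terms + 1))
    tail = 1.0 / (2.0 * n_terms ** 2)   # integral of x^-3 from N to infinity
    return partial + tail

bayes_second_moment = 16.0 * zeta3()      # Delta^4 * E u_B^2
mle_second_moment = 26.0                  # Delta^4 * E u_mle^2
kappa0 = bayes_second_moment / mle_second_moment
```

The value $16\zeta(3) \approx 19.23$ lies inside the interval $19.5 \pm 0.5$ reported in [10], in agreement with the remark that the exact and approximate values are close.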

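We are interested in estimating the smooth functional $\mathcal{L}[f^{\tau}(\theta,\cdot)]$. We will compare Bayesian
and maximum likelihood estimates of $\mathcal{L}[f^{\tau}(\theta,\cdot)]$. In fact, the problem is reduced to estimating the
parameters $\theta = (\theta_1,\theta_2)$ and $\tau$.

We can write an MLE $\hat{\mathcal{L}}^{\varepsilon}_{mle}$ of $\mathcal{L}[f^{\tau}(\theta,\cdot)]$ in terms of the local parameters $h$ and $u$ as

$$\hat{\mathcal{L}}^{\varepsilon}_{mle} = \mathcal{L}\big[ f^{\tau+\varepsilon^2 \hat{u}_\varepsilon}(\theta + \varepsilon \hat{h}_\varepsilon,\cdot) \big],$$

where

$$(\hat{h}_\varepsilon, \hat{u}_\varepsilon) = \arg\max_{(h,u) \in \Theta_\varepsilon \times T_\varepsilon} Z^{\varepsilon}_{\theta,\tau}(h,u).$$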
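Define an MLE that corresponds to the limiting likelihood obtained in Lemma 1,

$$\hat{\mathcal{L}}^{0}_{mle} = \mathcal{L}\big[ f^{\tau+\varepsilon^2 \hat{u}}(\theta + \varepsilon \hat{h},\cdot) \big], \tag{11}$$

where $(\hat{h},\hat{u}) = (\hat{h}_1,\hat{h}_2,\hat{u})$ is the point at which the limiting log-likelihood process (6) attains its
maximum. More precisely,

$$\hat{h}_i = \arg\max_{h_i \in \mathbb{R}} \Big\{ -\frac{1}{2} I_i^2 \Big( h_i - \frac{Z_i}{I_i} \Big)^2 \Big\} = \frac{Z_i}{I_i}, \qquad i = 1,2, \tag{12}$$

$$\hat{u} = \arg\max_{u \in \mathbb{R}} V(\Delta^2 u) = \arg\max_{u \in \mathbb{R}} \Big\{ |\Delta| \Big( B(u) - \frac{|\Delta|\,|u|}{2} \Big) \Big\}. \tag{13}$$

Obviously, $\mathbf{E} \hat{u} = 0$ and $\mathbf{E} \hat{h}_i = 0$, $\mathbf{E} \hat{h}_i^2 = 1/I_i^2$, $i = 1,2$. We also know that $\mathbf{E} \hat{u}^2 = 26/\Delta^4$.
Let $\hat{\mathcal{L}}^{\varepsilon}_{B}$ be a Bayesian estimate of $\mathcal{L}[f^{\tau}(\theta,\cdot)]$ for the quadratic loss function defined as

$$\hat{\mathcal{L}}^{\varepsilon}_{B} = \arg\min_{a} \int\!\!\int \big( a - \mathcal{L}[f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,\cdot)] \big)^2 Z^{\varepsilon}_{\theta,\tau}(h,u)\, dh_1\, dh_2\, du.$$

Define the generalized Bayesian estimate corresponding to the limiting likelihood process as

$$\hat{\mathcal{L}}^{0}_{B} = \arg\min_{a} \int_{\mathbb{R}} \int_{\mathbb{R}^2} \big( a - \mathcal{L}[f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,\cdot)] \big)^2 Z^{0}_{\theta,\tau}(h,u)\, dh_1\, dh_2\, du. \tag{14}$$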

Following the approach developed in [10] we can prove the following result. 

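Lemma 2. Let $\theta \in \Theta$, $\tau \in T$, where $\Theta$ is a compact subset of $\mathbb{R}^2$ and $T = [a,b] \subset [0,1]$. Let
Conditions F and L (L') be satisfied for all $\theta \in \Theta$, $\tau \in T$. Then uniformly over $\Theta \times T$

$$\lim_{\varepsilon \to 0} \frac{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{\varepsilon}_{mle} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2}{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{0}_{mle} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2} = 1 \tag{15}$$

and

$$\lim_{\varepsilon \to 0} \frac{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{\varepsilon}_{B} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2}{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{0}_{B} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2} = 1. \tag{16}$$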

Proof. The proof follows Theorems 1.10.1 and 1.10.2 of [10] and the continuity conditions F
and L (L'). We do not go into details here and only give a brief overview of the conditions of the two
theorems. Lemma 1 guarantees the convergence of the likelihood ratio on compact sets, so that
Condition 2 of Theorems 1.10.1 and 1.10.2 is satisfied.

Next, using the technique developed in [10] it can be shown that for $\theta \in \Theta$, $\tau \in T =
(a,b) \subset [0,1]$, $h', h'' \in \Theta_\varepsilon$, $u', u'' \in T_\varepsilon$, and $\varepsilon < \varepsilon_0$



E e , T [Zl T (h',u')} 1/4 -[Zl T (h",u")} 1/4 <^\\f T+£2u '(0 + eh',-)-r^\9 + eh'',-)\\i 



3 1 

^^2llJ ^tc»,V-J ^ TE " >VIIL 2 [0,1] 

<^(c 1 \\h'-h"\\ 2 +c 2 \ u , -u"\y. 
It follows that for $\varepsilon < \varepsilon_0$

P,, T jzf >T (M) > e^—idWhf + C 2 |A|H)) | < exp^-^dll^ll 2 + C 2 |A||«| 

These two relations form Condition 1 of Theorems 1.10.1 and 1.10.2, which is necessary for consistency
of the estimates.

Finally, Condition 3 of the theorems concerns the unique maximum of the limiting likelihood
process (6) in the case of an MLE of $\mathcal{L}$ and the unique minimum in (14). From the results in [10] on
the properties of the process $V(u)$ and the concavity of $\log Z^{0}_{\theta,\tau}(h,u)$ in $h$ it follows that these maximal
(minimal) values are unique with probability 1. ∎

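Lemma 3. Let $\hat{\mathcal{L}}^{\varepsilon}_{mle}$ be an MLE and $\hat{\mathcal{L}}^{\varepsilon}_{B}$ be a generalized Bayesian estimate of $\mathcal{L}[f^{\tau}(\theta,\cdot)]$,
respectively. Assume that Conditions L and F are satisfied. Then, as $\varepsilon \to 0$, the
quadratic risks of both estimators have the same first order term,

$$\lim_{\varepsilon \to 0} \mathbf{E}_{\theta,\tau}\, \varepsilon^{-2} \big( \hat{\mathcal{L}}^{\varepsilon}_{mle} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2 = \lim_{\varepsilon \to 0} \mathbf{E}_{\theta,\tau}\, \varepsilon^{-2} \big( \hat{\mathcal{L}}^{\varepsilon}_{B} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2 = \sum_{i=1,2} \frac{1}{I_i^2} \Big( \frac{\partial \mathcal{L}}{\partial \theta_i} [f^{\tau}(\theta,\cdot)] \Big)^2 .$$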



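Proof. From Lemma 2 it follows that the risks (15) and (16) of $\hat{\mathcal{L}}^{\varepsilon}_{mle}$ and $\hat{\mathcal{L}}^{\varepsilon}_{B}$ have the same
asymptotic behavior as the risks of $\hat{\mathcal{L}}^{0}_{mle}$ and $\hat{\mathcal{L}}^{0}_{B}$. Thus we have to calculate the risks of the limiting
estimates $\hat{\mathcal{L}}^{0}_{mle}$ and $\hat{\mathcal{L}}^{0}_{B}$ defined in (11) and (14), respectively.

Recall that $\theta = (\theta_1,\theta_2)$ and $\tau$ are fixed and the increments $h = (h_1,h_2)$ and $u$ of $\theta$ and $\tau$
belong to some compact sets, $h \in H \subset \Theta_\varepsilon$, $u \in U \subset T_\varepsilon$.

Let us first show that the following Taylor series expansion holds true:

$$\mathcal{L}\big[ f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,\cdot) \big] = \mathcal{L}[f^{\tau}(\theta,\cdot)] + \frac{\partial \mathcal{L}}{\partial \theta_1} [f^{\tau}(\theta,\cdot)]\, \varepsilon h_1 + \frac{\partial \mathcal{L}}{\partial \theta_2} [f^{\tau}(\theta,\cdot)]\, \varepsilon h_2 + o(\varepsilon), \qquad \varepsilon \to 0. \tag{17}$$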

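Since $\mathcal{L}$ is Fréchet differentiable, for $f(t) = f^{\tau}(\theta,t) \in L^2[0,1]$ there exists a linear mapping
$A_{\theta,\tau} : L^2[0,1] \to \mathbb{R}$ such that

$$\mathcal{L}\big[ f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,\cdot) \big] - \mathcal{L}[f^{\tau}(\theta,\cdot)] = A_{\theta,\tau}[\Delta f] + r[\Delta f]\, \|\Delta f\|_{L^2[0,1]}, \tag{18}$$

where

$$\Delta f(t) = f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,t) - f^{\tau}(\theta,t), \qquad u \in U, \ h \in H,$$

and $\lim_{\|\Delta f\|_{L^2[0,1]} \to 0} \|r(\Delta f)\| = 0$ for the remainder term $r : L^2[0,1] \to \mathbb{R}$. Set

$$\Delta f(t) = \big( f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,t) - f^{\tau}(\theta + \varepsilon h,t) \big) + \big( f^{\tau}(\theta + \varepsilon h,t) - f^{\tau}(\theta,t) \big) = \Delta_\theta f(t) + \Delta_\tau f(t).$$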

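Since $H$ and $U$ are compact sets and $f$ is continuous in $t$ everywhere except $t = \tau$
(Conditions F[a-b]), we have

$$\Delta_\theta f(t) = f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,t) - f^{\tau}(\theta + \varepsilon h,t) = r_\varepsilon(\theta,\tau,h,u,t),$$

where $\lim_{\varepsilon \to 0} \sup_{h \in H,\, u \in U} \|r_\varepsilon(\theta,\tau,h,u,\cdot)\|_{L^2[0,1]} = 0$. Indeed, we have for $t \ne \tau$

$$\Delta_\theta f(t) = \big( f_1(\theta_1 + \varepsilon h_1,t) - f_2(\theta_2 + \varepsilon h_2,t) \big)\, \mathbf{1}\{|t - \tau| < \varepsilon^2 u\}\, \big( \mathbf{1}\{u > 0\} - \mathbf{1}\{u < 0\} \big),$$

where the functions $f_1$ and $f_2$ are continuous on $[0,1]$.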
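From Condition F[c] it follows that for $t \ne \tau$

$$\Delta_\tau f(t) = f^{\tau}(\theta_1 + \varepsilon h_1, \theta_2 + \varepsilon h_2, t) - f^{\tau}(\theta_1,\theta_2,t)$$

$$= \frac{\partial f^{\tau}(\theta,t)}{\partial \theta_1}\, \varepsilon h_1 + \frac{\partial f^{\tau}(\theta,t)}{\partial \theta_2}\, \varepsilon h_2 + r_1(\varepsilon h_1,t)\, \varepsilon |h_1|\, \mathbf{1}\{t < \tau\} + r_2(\varepsilon h_2,t)\, \varepsilon |h_2|\, \mathbf{1}\{t > \tau\},$$

where $\lim_{\varepsilon \to 0} \sup_{h \in H} \|r_i(\varepsilon h_i,\cdot)\|_{L^2[0,1]} = 0$. Combining the two formulas for $\Delta_\tau f(t)$ and $\Delta_\theta f(t)$ we obtain

$$\Delta f(t) = \frac{\partial f^{\tau}(\theta,t)}{\partial \theta_1}\, \varepsilon h_1 + \frac{\partial f^{\tau}(\theta,t)}{\partial \theta_2}\, \varepsilon h_2 + r_\varepsilon(\theta,\tau,h,u,t),$$

where $\sup \|r_\varepsilon(\theta,\tau,h,u,\cdot)\|_{L^2[0,1]} \to 0$, $\varepsilon \to 0$. Thus, $\|\Delta f\|_{L^2[0,1]} \le \|\Delta_\theta f\|_{L^2[0,1]} + \|\Delta_\tau f\|_{L^2[0,1]} \to 0$
as $\varepsilon \to 0$ uniformly over $h$ and $u$ and, consequently, $\|r[\Delta f]\| = o(\varepsilon)$ in (18).

Substituting the obtained expansion in (18) gives the desired formula (17), where the linear
mappings are defined as $\frac{\partial \mathcal{L}}{\partial \theta_i}[f^{\tau}(\theta,\cdot)] = \frac{\partial f^{\tau}}{\partial \theta_i} \circ A_{\theta,\tau} : \Theta_i \to \mathbb{R}$.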

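First, we will calculate the asymptotic mean-square error of $\hat{\mathcal{L}}^{0}_{mle}$ defined in (11). From
formula (17) and the independence of $\hat{h}_1$ and $\hat{h}_2$ we obtain

$$\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{0}_{mle} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2 = \varepsilon^2 \Big( \frac{\partial \mathcal{L}}{\partial \theta_1} [f^{\tau}(\theta,\cdot)] \Big)^2 \frac{1}{I_1^2} + \varepsilon^2 \Big( \frac{\partial \mathcal{L}}{\partial \theta_2} [f^{\tau}(\theta,\cdot)] \Big)^2 \frac{1}{I_2^2} + o(\varepsilon^2),$$

since $\mathbf{E}_{\theta,\tau} \hat{h}_i^2 = 1/I_i^2$ and $\mathbf{E}_{\theta,\tau} \hat{h}_i = 0$.

Now calculate the risk of the Bayesian estimate $\hat{\mathcal{L}}^{0}_{B}$ defined in (14). Using the Taylor series
expansion (17) and formula (6) we have

$$\hat{\mathcal{L}}^{0}_{B} = \frac{\int\!\!\int \mathcal{L}[f^{\tau+\varepsilon^2 u}(\theta + \varepsilon h,\cdot)]\, Z^{0}_{\theta,\tau}(h,u)\, dh_1\, dh_2\, du}{\int\!\!\int Z^{0}_{\theta,\tau}(h,u)\, dh_1\, dh_2\, du}$$

$$= \mathcal{L}[f^{\tau}(\theta,\cdot)] + \frac{\partial \mathcal{L}}{\partial \theta_1} [f^{\tau}(\theta,\cdot)]\, \varepsilon\, \frac{Z_1}{I_1} + \frac{\partial \mathcal{L}}{\partial \theta_2} [f^{\tau}(\theta,\cdot)]\, \varepsilon\, \frac{Z_2}{I_2} + o(\varepsilon),$$

where the $Z_i$'s are independent $\mathcal{N}(0,1)$. Calculating the mean square risk gives exactly the same
asymptotic behavior as the one for the risk of $\hat{\mathcal{L}}^{0}_{mle}$. ∎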
Remark 2. If Condition L' is satisfied and $\mathcal{L}$ is twice differentiable w.r.t. $\theta_i$, we can calculate the
second order terms of the asymptotic risks applying the facts that $\mathbf{E} u_B = \mathbf{E} u_{mle} = 0$, $\mathbf{E} u_{mle}^2 = 26/\Delta^4$,
$\mathbf{E} u_B^2 = 16\zeta(3)/\Delta^4$. We have

E eA £ mlc - c\rv, -)]) 2 = e 2 e j* (f \r{e, •)]) 2 + ^ (j£\rv, ■)]) 2 

i=l,2 i ^ i ' 

E eA a - c\rv, -)]) 2 = c 2 Ej2 (w t [fT{9 > " )] ) 2 + " 4l ^ " )] ) 2 

+ £4 E JK y +3 (ff[/^-)]) 2 +0 ( £ 4 ). 

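If Condition L' is satisfied and $\mathcal{L}[f^{\tau}(\theta,\cdot)]$ is a function of $\tau$ only, $\mathcal{L}[f^{\tau}(\theta,\cdot)] = g(\tau)$, we obtain the
result (10) of Ibragimov and Hasminski as a corollary:

$$\lim_{\varepsilon \to 0} \frac{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{\varepsilon}_{B} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2}{\mathbf{E}_{\theta,\tau} \big( \hat{\mathcal{L}}^{\varepsilon}_{mle} - \mathcal{L}[f^{\tau}(\theta,\cdot)] \big)^2} = \lim_{\varepsilon \to 0} \frac{\mathbf{E}_{\theta,\tau} \big( g(\hat{\tau}^{\varepsilon}_{B}) - g(\tau) \big)^2}{\mathbf{E}_{\theta,\tau} \big( g(\hat{\tau}^{\varepsilon}_{mle}) - g(\tau) \big)^2} = \frac{8\zeta(3)}{13}.$$

4 Estimation in the sequence model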

In this section we give explicit estimates for the problem of estimating a smooth functional in the
equivalent sequence model. We assume that a very simple signal is observed, which is constant up
to some moment of time $\tau$ and equals zero afterwards.

We observe the vector $X = (X_1,\ldots,X_n)$, where the $X_i$'s are Gaussian random variables with
distribution $P_{\theta_i,\tau}$ defined on the probability space $(\mathcal{X},\mathcal{B}_i,P_{\theta_i,\tau})$,

$$X_i = \theta_i + \varepsilon \xi_i, \qquad i = 1,\ldots,n. \tag{19}$$

The signal is constant up to some moment of time $\tau$, $\theta_i = \theta\, \mathbf{1}(i - \tau \le 0)$. The parameters
$\tau \in \{1,\ldots,n\}$ and $\theta$ are unknown, the $\xi_i$'s are i.i.d. $\mathcal{N}(0,1)$, and $\varepsilon > 0$ is known. The goal is to estimate a
smooth function $L(\theta,\tau)$ of the signal.
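
For readers who want to experiment with model (19), a minimal simulation sketch follows. The function name `simulate_sequence` and its argument names are hypothetical; the only content taken from the text is the mean structure $\theta_i = \theta$ for $i \le \tau$ and $\theta_i = 0$ afterwards.

```python
import numpy as np

def simulate_sequence(theta, tau, n, eps, rng=None):
    """Draw X_i = theta * 1{i <= tau} + eps * xi_i, i = 1..n, with xi_i i.i.d. N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, n + 1)
    mean = np.where(i <= tau, theta, 0.0)   # signal: theta up to tau, zero afterwards
    return mean + eps * rng.standard_normal(n)

x = simulate_sequence(theta=1.5, tau=7, n=20, eps=1.0, rng=np.random.default_rng(0))
```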

4.1 Overview 

The change-point problem for the change in mean in the Bayesian set-up was considered by Chernoff
and Zacks [6]. They studied the change in mean for a sequence of Gaussian r.v.'s and obtained a
Bayes estimate of the difference in means before and after the change. This estimate was obtained
under the assumption that the current mean and the jump size have normal prior distributions.
The change-point was also assumed to be random with some arbitrary discrete prior. In their paper,
Chernoff and Zacks compare Bayesian estimates of the current mean and the minimum variance
linear unbiased estimates (MVLUE) when the signal-to-noise ratio is greater than 2. In particular,
they state that if there is exactly one change in the observations, the Bayes estimator is very
efficient if the change takes place at the beginning of the sequence and it loses its efficiency when
the change takes place very close to the last observation. In the case of at most one change Bayesian
procedures are always better than MVLUE. Rubin [12] considered the change-point problem in the
context of estimating discontinuities in multivariate densities. He discussed the Bayesian and maximum
likelihood approaches to the problem and mentioned that the Bayes procedure with the uniform
prior distribution on the unknown parameter $\tau$ gives more efficient estimates than the maximum
likelihood approach.

A maximum likelihood estimate (MLE) of a change-point was first obtained by Hinkley [9] in the
problem of estimating the moment of the change in mean of Gaussian data under the assumption that
the jump size is small. Later an asymptotic distribution of the MLE of a change-point for the case
of close normal means was derived by Bhattacharya and Brockwell [3]. Their result was generalized
by Bhattacharya [2] to the case of a small jump size in a multidimensional parameter. Following
[3] and [2], Ferger [8] proposed a class of estimates for a change-point based on U-statistics for the
case of small disorders in the distribution. Later Brodskii and Darkhovskii [4] studied the asymptotic
behavior of an estimate for the change-point in a Gaussian sequence with unknown mean without the
assumption that the difference between the means (the jump size) tends to zero. They proposed a
family of estimates for the change-point based on the Kolmogorov-Smirnov statistics which includes
an MLE. An asymptotic distribution of these estimates was derived and the corresponding testing
problem was considered.

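Let $\nu_n$ be a $\sigma$-finite measure on the $\sigma$-algebra $\mathcal{B} = \mathcal{B}_1 \times \cdots \times \mathcal{B}_n$ and $P_{\theta,\tau} = P_{\theta_1,\tau} \times \cdots \times P_{\theta_n,\tau}$.
Then the joint density of $X$ (likelihood) is given by

$$\frac{dP_{\theta,\tau}}{d\nu_n}(X) = p_n(X;\theta,\tau) = (2\pi\varepsilon^2)^{-n/2} \exp\Big\{ -\frac{1}{2\varepsilon^2} \Big( \sum_{i=1}^{\tau} (X_i - \theta)^2 + \sum_{i=\tau+1}^{n} X_i^2 \Big) \Big\}. \tag{20}$$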

Assuming that $\varepsilon = 1$ in model (19), Bhattacharya and Brockwell [3] derived a limiting process
for the likelihood ratio under the conditions that the parameter $\theta$ is small, $\theta = \delta \nu_n^{-1}$, where $\nu_n \to \infty$
slower than $n^{1/2}$, and the length of the observed sequence $n \to \infty$. According to their result, as
$n \to \infty$, the following weak convergence holds with respect to uniform convergence on compact
sets,

$$\log \frac{dP_{\theta+n^{-1/2}h,\,\tau+\nu_n^2 u}}{dP_{\theta,\tau}} \overset{w}{\longrightarrow} \frac{Z^2}{2} - \frac{1}{2} \lambda \Big( h - \frac{Z}{\sqrt{\lambda}} \Big)^2 + |\delta| \Big( B(u) - \frac{1}{2} |\delta|\,|u| \Big), \qquad n \to \infty,$$

where $Z$ is $\mathcal{N}(0,1)$ and $B(u)$ is a two-sided Wiener process (5) independent of $Z$ and $\lambda = \lim_{n \to \infty} \tau/n$.

Remark 3. Some general results on the behavior of the log-likelihood ratio and of the maximum
likelihood estimate of the change point $\tau$ can be found in Section 1.6 of [7]. In particular,
Theorems 1.6.2 and 1.6.3 of [7] state that the asymptotic distribution as $n \to \infty$ of the likelihood ratio
in the situation of a decreasing size of the change in means obtained in [3] differs from the one in
the situation of a fixed change in mean.

In our case the number of observations $n$ is fixed and the change in mean (the jump size at $\tau$)
$\theta$ is fixed. Let $\theta \in \Theta$, where $\Theta$ is an open subset of $\mathbb{R}$, and $\tau = [n\alpha]$, where $\alpha \in A = (0,1)$. Define
the sets $\Theta_\varepsilon = \varepsilon^{-1}(\Theta - \theta)$ and $A_\varepsilon = \varepsilon^{-2}(A - \alpha)$. The following lemma, which is given without proof,
describes the asymptotic behavior of the likelihood ratio as $\varepsilon \to 0$.
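Lemma 4. Let $X = (X_1,\ldots,X_n)$ be given by (19) and $H$, $V$ be some compact subsets of $\Theta_\varepsilon$ and
$A_\varepsilon$, respectively. Then the following weak convergence holds uniformly over $(h,v) \in H \times V$ as
$\varepsilon \to 0$,

$$\log \frac{dP_{\theta+\varepsilon h,\,[n(\alpha+\varepsilon^2 v)]}}{dP_{\theta,[n\alpha]}}(X) \overset{w}{\longrightarrow} \frac{Z^2}{2} - \frac{\tau}{2} \Big( h - \frac{Z}{\sqrt{\tau}} \Big)^2 + |\theta| \sqrt{n} \Big( B(v) - \frac{|\theta| \sqrt{n}\, |v|}{2} \Big), \qquad \varepsilon \to 0, \tag{21}$$

where $Z$ is $\mathcal{N}(0,1)$ independent of the two-sided Brownian motion $B(v)$.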

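4.2 MLE and Bayesian estimate of $L(\theta,\tau)$

Let us find an MLE of $L(\theta,\tau)$. The log-likelihood $\log p_n(X;\theta,\tau)$ satisfies

$$\log p_n(X;\theta,\tau) + \frac{n}{2} \log(2\pi\varepsilon^2) = -\frac{1}{2\varepsilon^2} \Big\{ \sum_{i=1}^{\tau} (X_i - \theta)^2 + \sum_{i=\tau+1}^{n} X_i^2 \Big\} = -\frac{1}{2\varepsilon^2} \Big\{ \sum_{j=1}^{n} X_j^2 - 2\theta \sum_{i=1}^{\tau} X_i + \tau \theta^2 \Big\}.$$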

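First, we maximize the log-likelihood with respect to $\theta$ and replace $\theta$ by its conditional MLE
$\bar{X}_\tau = \frac{1}{\tau} \sum_{i=1}^{\tau} X_i$. Next, maximizing the obtained log-likelihood with respect to $\tau$ we obtain an
MLE of the change-point $\tau$,

$$\hat{\tau}_{mle} = \arg\max_{1 \le k \le n} \Big\{ \frac{1}{2\varepsilon^2 k} \Big( \sum_{i=1}^{k} X_i \Big)^2 \Big\} = \arg\max_{1 \le k \le n} U_k, \tag{22}$$

where

$$U_k = \frac{1}{2\varepsilon^2 k} \Big( \sum_{i=1}^{k} X_i \Big)^2. \tag{23}$$

The estimate (22) of $\tau$ for an unknown change in mean of the normal distribution was first obtained by
Hinkley in [9]. An asymptotic distribution as $\varepsilon \to 0$ of this estimate was derived by Brodskii and
Darkhovskii in [4].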

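Finally, an MLE of $L(\theta,\tau)$ is given by

$$\hat{L}_{mle} = L(\hat{\theta}_{mle}, \hat{\tau}_{mle}) = L(\bar{X}_{\hat{\tau}_{mle}}, \hat{\tau}_{mle}).$$

For example, if $L(\theta,\tau) = \sum_{i=1}^{n} \theta_i = \theta\tau$, then $\hat{L}_{mle} = \sum_{i=1}^{\hat{\tau}_{mle}} X_i$.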
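
The two-step maximization above translates directly into a few array operations. The sketch below computes $U_k$ from (23), the change-point estimate (22), the conditional mean estimate, and the plug-in estimate of $L(\theta,\tau) = \theta\tau$; the function name `mle_change_point` and the small noiseless data vector are illustrative assumptions.

```python
import numpy as np

def mle_change_point(x, eps=1.0):
    """Return (tau_hat, theta_hat, U) for the change-in-mean model (19)."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, len(x) + 1)
    s = np.cumsum(x)                        # partial sums S_k = X_1 + ... + X_k
    u = s ** 2 / (2.0 * eps ** 2 * k)       # U_k from (23)
    tau_hat = int(k[np.argmax(u)])          # maximiser over 1 <= k <= n, as in (22)
    theta_hat = s[tau_hat - 1] / tau_hat    # sample mean of the first tau_hat observations
    return tau_hat, theta_hat, u

tau_hat, theta_hat, u = mle_change_point([1.0, 1.0, 1.0, 0.0, 0.0])
L_mle = theta_hat * tau_hat                 # plug-in estimate of L(theta, tau) = theta * tau
```

On this noiseless vector the partial sums are 1, 2, 3, 3, 3, so $U_k$ takes the values 0.5, 1.0, 1.5, 1.125, 0.9 and the maximum is attained at $k = 3$, giving $\hat{L}_{mle} = 3$.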

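Remark 4. Note that using (19) we can write

$$\hat{\tau}_{mle} = \arg\max_{1 \le k \le n} \Big| \frac{\varepsilon}{\sqrt{k}} \sum_{i=1}^{k} \xi_i + \theta \sqrt{k}\, \mathbf{1}\{k \le \tau\} + \frac{\theta \tau}{\sqrt{k}}\, \mathbf{1}\{k > \tau\} \Big| .$$

Thus, to find the non-asymptotic risk of $\hat{L}_{mle}$ we need to calculate the joint distribution of $\hat{\tau}_{mle}$ and
$\sum_{i=1}^{\hat{\tau}_{mle}} \xi_i$. This problem is similar to the calculation of the joint distribution of

$$\hat{t} = \arg\max_{t} \Big| \theta \sqrt{t}\, \mathbf{1}\{t \le \tau\} + \frac{\theta \tau}{\sqrt{t}}\, \mathbf{1}\{t > \tau\} + \varepsilon\, \frac{W(t)}{\sqrt{t}} \Big| \quad \text{and} \quad W(\hat{t}).$$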
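
The representation in Remark 4 rests on the decomposition of the normalized partial sum into a deterministic drift and a noise term. The short sketch below checks this decomposition numerically on one simulated sample; the parameter values and the seed are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau, theta, eps = 20, 8, 1.0, 1.0
xi = rng.standard_normal(n)
k = np.arange(1, n + 1)
x = theta * (k <= tau) + eps * xi                 # observations from model (19)

stat = np.abs(np.cumsum(x)) / np.sqrt(k)          # |S_k| / sqrt(k)
# Drift: theta * sqrt(k) before the change, theta * tau / sqrt(k) after it.
drift = theta * np.sqrt(k) * (k <= tau) + theta * tau / np.sqrt(k) * (k > tau)
noise = eps * np.cumsum(xi) / np.sqrt(k)          # normalized partial sums of the noise

u = np.cumsum(x) ** 2 / (2.0 * eps ** 2 * k)      # U_k from (23)
same_argmax = int(np.argmax(stat)) == int(np.argmax(u))
```

Since $U_k = (|S_k|/\sqrt{k})^2 / (2\varepsilon^2)$ is an increasing function of $|S_k|/\sqrt{k}$, both criteria select the same index.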



Let us now find a Bayesian estimate of $L(\theta,\tau)$. Assume that $\theta$ has a non-informative prior
distribution $\mathcal{N}(0,\sigma^2)$, where $\sigma^2 \to \infty$, and $\tau$ is uniformly distributed on the set $\{1,\ldots,n\}$. Then
the generalized posterior density of $(\theta,\tau)$ is given by

$$\pi(\theta,\tau \mid X) = \frac{1}{\varepsilon \sqrt{2\pi}}\, \frac{\exp\big( -\frac{\tau}{2\varepsilon^2} (\theta - \bar{X}_\tau)^2 + U_\tau \big)}{\sum_{k=1}^{n} e^{U_k} / \sqrt{k}}, \tag{24}$$

where $U_\tau$ is defined in (23). Chernoff and Zacks [6] used the Bayesian approach with normal priors on
$\theta$ and a uniform prior on $\tau$ to obtain an estimate of the mean of the observations after the change.
In [11] under the same assumptions the posterior distribution (24) in the change-point problem for
normal observations was calculated.

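Thus, we obtain the following Bayesian estimate of $L(\theta,\tau)$,

$$\hat{L}_B = \sum_{\tau=1}^{n} \int_{\mathbb{R}} L(\theta,\tau)\, \pi(\theta,\tau \mid X)\, d\theta = \frac{1}{\varepsilon \sqrt{2\pi}} \sum_{\tau=1}^{n} p_\tau \sqrt{\tau} \int_{\mathbb{R}} L(\theta,\tau) \exp\Big( -\frac{\tau}{2\varepsilon^2} (\theta - \bar{X}_\tau)^2 \Big)\, d\theta, \tag{25}$$

where

$$p_k = \frac{e^{U_k}}{\sqrt{k}} \Big( \sum_{j=1}^{n} \frac{e^{U_j}}{\sqrt{j}} \Big)^{-1}.$$

For example, if $L(\theta,\tau) = \theta\tau = \sum_{i=1}^{n} \theta_i$, then $\hat{L}_B$ is a weighted sum of the $X_i$'s with weights $p_k$,

$$\hat{L}_B = \sum_{k=1}^{n} p_k \sum_{i=1}^{k} X_i = \frac{\sum_{k=1}^{n} \frac{e^{U_k}}{\sqrt{k}} \sum_{i=1}^{k} X_i}{\sum_{k=1}^{n} \frac{e^{U_k}}{\sqrt{k}}}.$$

Note that the Bayesian estimate of $\tau$ for the quadratic loss function is given by

$$\hat{\tau}_B = \sum_{k=1}^{n} k\, p_k = \sum_{k=1}^{n} \sqrt{k}\, e^{U_k} \Big( \sum_{k=1}^{n} \frac{e^{U_k}}{\sqrt{k}} \Big)^{-1}$$

and the corresponding estimate for $\theta$ is

$$\hat{\theta}_B = \sum_{k=1}^{n} p_k \bar{X}_k.$$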
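
The posterior weights $p_k \propto e^{U_k}/\sqrt{k}$ and the three estimates built from them can be computed in a numerically stable way by working with logarithms. The following is a minimal sketch for the example $L(\theta,\tau) = \theta\tau$; the function name `bayes_estimates` and the rescaling by the maximal log-weight are choices made here, not part of the paper.

```python
import numpy as np

def bayes_estimates(x, eps=1.0):
    """Return (p, L_b, tau_b, theta_b) for model (19) with L(theta, tau) = theta * tau."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, len(x) + 1)
    s = np.cumsum(x)                         # partial sums S_k
    u = s ** 2 / (2.0 * eps ** 2 * k)        # U_k from (23)
    log_w = u - 0.5 * np.log(k)              # log of exp(U_k) / sqrt(k)
    w = np.exp(log_w - log_w.max())          # rescale before exponentiating
    p = w / w.sum()                          # posterior weights p_k
    L_b = float(np.sum(p * s))               # weighted sum of partial sums
    tau_b = float(np.sum(p * k))             # posterior mean of tau
    theta_b = float(np.sum(p * s / k))       # weighted sum of sample means
    return p, L_b, tau_b, theta_b

p, L_b, tau_b, theta_b = bayes_estimates([1.0, 1.0, 1.0, 0.0, 0.0], eps=1.0)
```

For a very small noise level the weights concentrate on the maximiser of $U_k$, so the Bayes and maximum likelihood estimates nearly coincide; for moderate noise the weights spread over several values of $k$.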
5 Simulation Study



We studied the quadratic risks of Bayesian and maximum likelihood estimates of

$$L(\theta,\tau) = \sum_{i=1}^{n} \theta_i = \theta\tau.$$

$10^4$ simulations were made for $n = 20$ observations in model (19) with the values of $\theta \in \{0.5, 1, 1.5, 2\}$,
for the change-points $\tau = 3, 4, \ldots, 17, 18$, and the noise level $\varepsilon = 1$.
First, introduce the following notation for risk ratios,

$$\kappa(\tau,\theta/\varepsilon) = \frac{\mathbf{E}_{\theta,\tau} (\hat{\tau}_B - \tau)^2}{\mathbf{E}_{\theta,\tau} (\hat{\tau}_{mle} - \tau)^2}, \qquad k(\tau,\theta/\varepsilon) = \frac{\mathbf{E}_{\theta,\tau} (\hat{L}_B - L)^2}{\mathbf{E}_{\theta,\tau} (\hat{L}_{mle} - L)^2}.$$

In Figure 1 the graphs of the empirical risk ratios $\kappa = \kappa(\tau,\theta/\varepsilon)$ and $k = k(\tau,\theta/\varepsilon)$ depending
on $\tau = 3, 4, \ldots, 17, 18$ are presented for different values of the signal-to-noise ratio (SNR), $\theta/\varepsilon =
0.5, 1, 1.5, 2$. For simplicity we assume that the signal $\theta$ is positive.
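
A simulation of this kind can be sketched in a few lines. The code below is not the program used for the paper; it is a minimal Monte Carlo sketch under the assumptions that the MLE maximizes $U_k$ over $1 \le k \le n$, that the Bayes estimates use the weights $p_k \propto e^{U_k}/\sqrt{k}$, and that $L(\theta,\tau) = \theta\tau$. The function names, the seed, and the number of replications are illustrative.

```python
import numpy as np

def estimates(x, eps):
    """Return (tau_mle, L_mle, tau_b, L_b) for L(theta, tau) = theta * tau."""
    k = np.arange(1, len(x) + 1)
    s = np.cumsum(x)
    u = s ** 2 / (2.0 * eps ** 2 * k)       # U_k
    j = int(np.argmax(u))                   # index of the maximum likelihood change-point
    log_w = u - 0.5 * np.log(k)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()                            # posterior weights p_k
    return k[j], s[j], np.sum(p * k), np.sum(p * s)

def risk_ratios(theta, tau, n=20, eps=1.0, reps=10_000, seed=0):
    """Empirical ratios (Bayes risk / MLE risk) for estimating tau and L."""
    rng = np.random.default_rng(seed)
    mean = np.where(np.arange(1, n + 1) <= tau, theta, 0.0)
    err = np.empty((reps, 4))
    for r in range(reps):
        x = mean + eps * rng.standard_normal(n)
        tm, lm, tb, lb = estimates(x, eps)
        err[r] = (tm - tau, lm - theta * tau, tb - tau, lb - theta * tau)
    mse = np.mean(err ** 2, axis=0)
    return mse[2] / mse[0], mse[3] / mse[1]

kappa_hat, k_hat = risk_ratios(theta=1.0, tau=10, reps=2000)
```

Looping `risk_ratios` over $\tau = 3,\ldots,18$ and $\theta \in \{0.5, 1, 1.5, 2\}$ would give curves of the kind shown in Figure 1, up to Monte Carlo error.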

Recall that asymptotically as $\varepsilon \to 0$ the relative efficiency of the MLE of $\tau$ with respect to
the Bayes estimate of $\tau$ is about 0.74, $\lim_{\varepsilon \to 0} \kappa(\tau,\theta/\varepsilon) = \kappa_0 \approx 0.7397$. It means that the MLE of $\tau$ is
about 17% less efficient than the Bayesian estimate $\hat{\tau}_B$, if the SNR $\theta/\varepsilon$ is large.

The examination of our numerical results leads to the following conclusions. 

(a) Large SNR, $\theta/\varepsilon > 1$.

It is clearly seen from Fig. 1(a) that for large $\theta/\varepsilon$ the ratio $\kappa(\tau,\theta/\varepsilon)$ is close to its asymptotic
theoretical value 0.7397. For $\theta/\varepsilon = 2$ the risk ratio $\kappa$ fluctuates between 0.72 and 0.77. For
$\theta/\varepsilon = 1.5$ we have $0.57 < \kappa < 0.8$. However, this is not the case for the ratio $k(\tau,\theta/\varepsilon)$ of risks
of estimating $L$ presented in Fig. 1(b). In this case the behavior of the risk ratio depends
both on $\theta/\varepsilon$ and $\tau$. For large SNR $\theta/\varepsilon = 1.5, 2$ the relative efficiency is close to 1. It means
that asymptotically both estimates of $L$ have very close risks.

(b) Small SNR, $\theta/\varepsilon \le 1$.

For small SNR and moderate or small values of $\tau$, the Bayes estimate $\hat{L}_B$ has to be preferred
to the MLE $\hat{L}_{mle}$. For example, if $\theta/\varepsilon = 0.5$ and the change in the data takes place
close to the beginning of the sequence, $\tau/n < 0.4$, then the Bayes estimate $\hat{L}_B$ of $L$ is almost
twice as efficient as the MLE $\hat{L}_{mle}$. If $\tau$ is large, then the MLE of $L$ has to be
chosen instead of the Bayes estimate. At the same time, the Bayes estimate of $\tau$ (Fig. 1(a))
is always more efficient than the MLE in the case of small SNR ($\theta/\varepsilon = 0.5, 1$). Moreover, the
smaller the SNR is, the better the Bayesian estimates behave compared to the maximum
likelihood estimates, both for estimating $\tau$ and $L$.

(c) Dependence on $\tau$.

Fig. 1(b) shows that the Bayesian estimate of $L$ is more efficient if the change takes place
close to the beginning of the sequence. For example, for $\theta/\varepsilon = 1$ and $\tau/n < 0.4$ the Bayes
estimate is more efficient than the MLE, and vice versa, the MLE of $L$ is more efficient for large
values of $\tau$, $\tau/n > 0.7$. If the values of $\tau$ are moderate, $0.4 < \tau/n < 0.7$, then the choice
between the MLE and the Bayes estimate of $L$ depends on the SNR.
Our simulation results are very similar to the results of Sen and Srivastava [14]. They made a
comparative study of the Bayes and likelihood ratio tests for the problem of testing the hypothesis
of "no change" in Gaussian data. It turned out that in the case of known mean $\theta$ in the data the
Bayes test is superior for $\tau/n < 0.4$, the LRT is superior for $\tau/n > 0.75$, and for $0.4 < \tau/n < 0.75$
the Bayes test dominates the LRT for small $\theta$ and vice versa.

The risks of the MLE and Bayesian estimates of a smooth functional $\mathcal{L}$ have the same first order
asymptotic term as $\varepsilon \to 0$. Thus, from the viewpoint of asymptotic behavior it makes no difference
which approach is chosen for estimation.

For small values of the signal-to-noise ratio the Bayesian procedure performs much better
than the ML procedure for a large part of the values of the change-point $\tau$ in the case of quadratic losses.
We cannot explain this fact theoretically, since the behavior of the risk ratio is only known for large
SNR as $\varepsilon \to 0$.

Simulation studies show that Bayesian procedures work remarkably better than MLE
procedures in the case of a small signal-to-noise ratio. On the other hand, asymptotically, both procedures
show the same performance. We think that due to this fact Bayesian estimates have to be used in
the non-asymptotic framework. Unfortunately, in the non-asymptotic setting, their theoretical risk
properties are very difficult to obtain.

Acknowledgements. The author is grateful to an anonymous referee for constructive com- 
ments and suggestions that helped to improve the paper. 

References 

[1] P. K. Bhattacharya, Some aspects of change-point analysis, Change-point Problems, IMS Lecture Notes,
23 (1994), pp. 28-55.

[2] P. K. Bhattacharya, Maximum likelihood estimation of a change-point in the distribution of independent
random variables: general multiparameter case, J. Multivariate Anal., 23 (1987), pp. 183-208.

[3] P. K. Bhattacharya and P. J. Brockwell, The minimum of an additive process with applications to signal
estimation and storage theory, Z. Wahrsch. verw. Gebiete, 37 (1976), pp. 51-75.

[4] B. E. Brodskii and B. S. Darkhovskii, Asymptotic Analysis of Some Estimates In the a Posteriori
"Disorder" Problem, Theory Prob. Appl., 35 (1990), pp. 550-556.

[5] B. E. Brodsky and B. S. Darkhovsky, Nonparametric Methods in Change-Point Problems (1993)
Kluwer Acad. Publ., the Netherlands.

[6] H. Chernoff and S. Zacks, Estimating the current mean of a normal distribution which is subject to
changes in time, Ann. Math. Statist., 35 (1964), pp. 999-1028.

[7] M. Csörgő and L. Horváth, Limit Theorems In Change-point Analysis (1997) Wiley Series in
Probability and Statistics.

[8] D. Ferger, Change-point estimators in case of small disorders, J. of Stat. Plan. Inf., 40 (1994), pp. 33-49.

[9] D. V. Hinkley, Inference about the change point in a sequence of random variables, Biometrika 57 (1970),
pp. 1-17.

[10] I. A. Ibragimov and R. Z. Hasminski, Statistical Estimation: Asymptotic Theory (1981) Springer, New
York.

[11] A. F. S. Lee and S. M. Heghinian, A shift of the mean level in a sequence of independent normal random
variables - a Bayesian approach, Technometrics, 19 (1977), pp. 503-506.

[12] H. Rubin, The estimation of discontinuities in multivariate densities and related problems in stochastic
processes, In: Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1, Univ. California Press, Berkeley
(1961) pp. 563-574.

[13] H. Rubin and K.-S. Song, Exact computation of the asymptotic efficiency of maximum likelihood
estimators of a discontinuous signal in a Gaussian white noise, Ann. Statist., 23 (1995), pp. 732-739.

[14] A. Sen and M. S. Srivastava, On tests for detecting change in mean, Ann. Statist., 3 (1975), pp. 98-108.

[15] A. N. Shiryaev, Optimal Stopping Rules (1978) Springer-Verlag, New York.



Figure 1: Graphs of risk ratios $\kappa$ and $k$ depending on $\tau \in \{3, 4, \ldots, 18\}$ for $n = 20$ observations,
$\varepsilon = 1$, and different values of $\theta = 0.5, 1, 1.5, 2$.